feat: enabled saving and evaluation for moderator (#271) #272

XuhuiZhou · 2025-01-07T19:45:29Z

feat: enable saving AgentProfile and chat history on redis
fix: AgentProfile can now be loaded by EpisodeLog correctly
feat: enable evaluation of EpisodeLog
[autofix.ci] apply automated fixes
fix: use EpisodeLog and AgentProfile from sotopia directly

Closes #

📑 Description

✅ Checks

My pull request adheres to the code style of this project
My code requires changes to the documentation
I have updated the documentation as required
All the tests have passed
Branch name follows type/descript (e.g. feature/add-llm-agents)
Ready for code review

ℹ Additional Information

* api doc
* add PUT
* add an temp example for websocket
* websocket
* update readme
* Update README.md
* update websocket live simulation api doc
* [autofix.ci] apply automated fixes
* update websocket doc
* add api server with websocket as well as a client
* fix mypy errors
* support stopping the chat
* add 404 to the status code
* fix mypy issue
* update the returned message types
* redesign websocket api
* update websocket, fix mypy error
* add example of using websocket
* clean code & change to existing functions for simulation
* fix typing mismatch
* update doc & mypy type fix
* add type check for run_async_server
* move example

---------

Co-authored-by: Hao Zhu <[email protected]>
Co-authored-by: Zhe Su <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* add customizable evaluation dimensions
* add docs
* fix mypy error & refactor examples
* add docs for evaluation dimensions
* update docs and examples
* add test cases and fix mypy issue
* fix mypy issue
* Fix test_create_custom_dimension to use CustomEvaluationDimension.get(pk) (#262)
  Co-authored-by: openhands <[email protected]>
* Fix/custom eval dimension test (#263)
  * Fix test_create_custom_dimension to use CustomEvaluationDimension.get(pk)
  * Update documentation for SotopiaDimension and EvaluationDimensionBuilder
  * [autofix.ci] apply automated fixes
  * Add API documentation for evaluation dimensions
  * Refine API documentation for evaluation_dimensions.py to match style
  * [autofix.ci] apply automated fixes
  ---------
  Co-authored-by: openhands <[email protected]>
  Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* add doc

---------

Co-authored-by: XuhuiZhou <[email protected]>
Co-authored-by: openhands <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

…lationshio (#265)

* temp run
* add relationship api
* fix mypy error
* update relationship api
* simulate episode non-streaming
* modify sim episodes
* add simulation status
* task error
* add background task
* [autofix.ci] apply automated fixes
* back to arun one episode
* upload the code
* use rq to execute background tasks
* temp sol

---------

Co-authored-by: Hao Zhu <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* initial framework
* initial conv
* fix module error
* feat: Add 3 new features to Moderator (#266)
  * feat:introduce booting procedure, saving, and ending chat to moderator
  * fix: moderator will now ignore none AgentAction, Observations now don't have to include all channels in the mapping
  * merge changes of example into the original one
  * fix: 1. save() method now accepts push_to_db config 2. booting()'s waiting time is changed to 0.1 sec
  * fix: rewrite booting() so that different agent will receive different background information
  * fix: moderator now inherits from Node directly, instead of from BaseAgent
  ---------
  Co-authored-by: JXZhou <JXZhou>
* add save condition for moderator
* push to db false
* to fully stop
* stopping all agents
* fix mypy
* fix mypy error

---------

Co-authored-by: JXZhou <[email protected]>

* prototype for modal serving
* add openai secret
* fix type annotation
* add doc
* bug fix for simulation api
* add customize model, evaluator model and evaluation dimensions
* Implement modal API server with Redis integration and FastAPI setup
  - Added a new script for the modal API server that initializes a Redis instance.
  - Created a persistent volume for Redis data and included a function to download initial data if not present.
  - Configured a Docker image with necessary dependencies including Redis Stack and FastAPI.
  - Implemented a web API class that sets up and cleans up the Redis connection, ensuring readiness before serving requests.
  - Integrated the SotopiaFastAPI application within the modal framework.

---------

Co-authored-by: XuhuiZhou <[email protected]>

* initial
* initial ui
* merge main
* add new ui
* switch to fastAPI
* websocket check
* fix render episode error
* add page; make a simplified page and still WIP
* [autofix.ci] apply automated fixes
* fix simplified streaming version
* semi-done character page + avatar assets
* Fixed character card styling
* [autofix.ci] apply automated fixes
* unified rendering and chat display
* updated chat character icons
* add some tags
* add typing
* temp fix
* add characters avatar to simulation
* fix episode full avatar
* go to modal config
* clean up code
* add modal streamlit app
* clean codebase except websocket
* remove repeated local css
* clean websocket
* fix get name error
* fix errors
* pre render scenario
* add custom eval
* change streamlit to dynamic path
* new uv
* revert to previous install commands
* a fix for modal
* add customized dimension
* [autofix.ci] apply automated fixes
* sort scenarios in simulation
* for demo video
* update deploy instruction
* update intro page
* update intro page
* [autofix.ci] apply automated fixes
* update intro page
* add customized dimensions
* update api link and modal environment
* move folder
* fix relative import
* update modal image build
* use uv to build environment
* change folder name
* change test
* fix modal serve
* environment change
* refactor
* fix ui

---------

Co-authored-by: Zhe Su <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: astrophie <[email protected]>

* feat: enable saving AgentProfile and chat history on redis
* fix: AgentProfile can now be loaded by EpisodeLog correctly
* feat: enable evaluation of EpisodeLog
* [autofix.ci] apply automated fixes
* fix: use EpisodeLog and AgentProfile from sotopia directly

---------

Co-authored-by: JXZhou <JXZhou>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

XuhuiZhou · 2025-01-07T20:09:26Z

sotopia/experimental/agents/moderator.py

-        if len(self.message_history) == 1:
-            self.message_history[0].append(
+        self.message_history.append(
+            [
                (
                    agent_action.agent_name,
                    "Environment",


@JXZhou0224 let's think this through a bit more carefully. A message should be (sender, receivers (separated by commas), message content), so maybe we should not put "Environment" here anymore.

Certainly! I will fix it. I set the receiver to "Environment" because the render_for_humans() method in EpisodeLog requires messages to be sent to "Environment" in order to render correctly. I will change it to "All" to represent sending to everyone and rewrite the render_for_humans() method accordingly.

Ah, I see. Maybe that's fine if "Environment" defaults to all agents.
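To make the proposed (sender, receivers, content) format concrete, here is a minimal sketch. The Message tuple, the resolve_receivers() helper, and the rule that "Environment" (or "All") broadcasts to every agent except the sender are all assumptions for illustration, not the sotopia API.

```python
from typing import NamedTuple

# Assumption: either name in a receivers field means "everyone".
BROADCAST_RECEIVERS = {"Environment", "All"}


class Message(NamedTuple):
    """Hypothetical (sender, receivers, content) triple; receivers is comma-separated."""

    sender: str
    receivers: str
    content: str


def resolve_receivers(message: Message, agent_names: list[str]) -> list[str]:
    """Expand the comma-separated receivers field into concrete agent names."""
    names = [name.strip() for name in message.receivers.split(",") if name.strip()]
    if any(name in BROADCAST_RECEIVERS for name in names):
        # Broadcast: deliver to every agent except the sender (a design choice).
        return [name for name in agent_names if name != message.sender]
    # Otherwise keep only receivers that are known agents.
    return [name for name in names if name in agent_names]


agents = ["Alice", "Bob", "Carol"]
print(resolve_receivers(Message("Alice", "Environment", "Hi everyone!"), agents))  # ['Bob', 'Carol']
print(resolve_receivers(Message("Bob", "Alice, Carol", "Psst"), agents))  # ['Alice', 'Carol']
```

Treating "Environment" as an alias for a broadcast would keep existing rendering code working while new code writes explicit receiver lists.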


XuhuiZhou · 2025-01-08T03:25:04Z

sotopia/experimental/agents/moderator.py

-                            last_turn=self.scenario,
+                            last_turn=json.dumps(
+                                {
+                                    "use_pk_value": self.use_pk_value,


What is this for?

The hyperparameter that determines whether to use AgentProfiles from the existing database is set in the moderator. Therefore, the moderator communicates this setting to all agents before the conversation starts.
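A minimal sketch of how the moderator could pass such settings to agents as a JSON string in last_turn, and how an agent could decode it. Only use_pk_value comes from the diff above; the agent_name and background fields and both helper functions are hypothetical.

```python
import json
from typing import Any


def build_boot_payload(use_pk_value: bool, agent_name: str, background: dict[str, Any]) -> str:
    """Moderator side (hypothetical): serialize per-agent boot info into JSON."""
    return json.dumps(
        {
            "use_pk_value": use_pk_value,  # field taken from the diff above
            "agent_name": agent_name,  # hypothetical field
            "background": background,  # hypothetical field
        }
    )


def parse_boot_payload(last_turn: str) -> dict[str, Any]:
    """Agent side (hypothetical): decode the moderator's JSON with a safe default."""
    payload: dict[str, Any] = json.loads(last_turn)
    payload.setdefault("use_pk_value", False)
    return payload


raw = build_boot_payload(True, "Alice", {"occupation": "teacher"})
info = parse_boot_payload(raw)
print(info["use_pk_value"], info["background"]["occupation"])  # True teacher
```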

Sounds good! It's okay to design things so that each agent initializes its own background. However, this could be tricky, because an agent then has no idea of the other agents' backgrounds or the overall scenario, unless we add a second step to share that information.

An easier alternative is to follow the original sotopia implementation.
Can you use the reset() below as a reference and write a similar version for initializing the simulation that supports more than two agents?

sotopia/sotopia/envs/parallel.py

Line 186 in d7724db

def reset(
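One possible shape for an N-agent initialization, sketched under assumptions: AgentSetup and reset_observations() are hypothetical, and the choice to show every agent the shared scenario and all backgrounds but only its own goal is a design assumption here, not a description of what reset() in parallel.py does.

```python
from dataclasses import dataclass


@dataclass
class AgentSetup:
    """Hypothetical per-agent setup; field names are illustrative."""

    name: str
    background: str
    goal: str


def reset_observations(scenario: str, agents: list[AgentSetup]) -> dict[str, str]:
    """Build one initial observation per agent for an N-agent episode.

    Every agent sees the shared scenario and all backgrounds,
    but only its own goal; other agents' goals are masked as "Unknown".
    """
    backgrounds = "\n".join(f"{a.name}'s background: {a.background}" for a in agents)
    observations: dict[str, str] = {}
    for agent in agents:
        goals = "\n".join(
            f"{other.name}'s goal: {other.goal if other is agent else 'Unknown'}"
            for other in agents
        )
        observations[agent.name] = f"Scenario: {scenario}\n{backgrounds}\n{goals}"
    return observations


agents = [
    AgentSetup("Alice", "a teacher", "borrow a book"),
    AgentSetup("Bob", "a student", "return a book"),
    AgentSetup("Carol", "a librarian", "close early"),
]
obs = reset_observations("A small library", agents)
print(obs["Bob"])
```

Because every observation shares the same scenario and background text, agents get common context in a single step, which avoids the second synchronization step mentioned above.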

XuhuiZhou · 2025-01-08T03:44:37Z

sotopia/experimental/agents/moderator.py

-        epilog.save()
-        # print(epilog.render_for_humans())
+        if self.push_to_db:
+            epilog.save()
        return epilog


Maybe we need to change this to an evaluation queue or something similar.
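If saving moves behind an evaluation queue, a minimal asyncio sketch could look like the following. Episode, evaluate(), and evaluation_worker() are hypothetical stand-ins for EpisodeLog and a real evaluator; appending to a list stands in for epilog.save(), and a None sentinel shuts the worker down.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical stand-in for an EpisodeLog."""

    episode_id: str
    messages: list[tuple[str, str, str]]  # (sender, receivers, content)
    scores: dict[str, float] = field(default_factory=dict)


def evaluate(episode: Episode) -> dict[str, float]:
    """Placeholder evaluator: counts messages per sender (illustrative only)."""
    counts: dict[str, float] = {}
    for sender, _receivers, _content in episode.messages:
        counts[sender] = counts.get(sender, 0.0) + 1.0
    return counts


async def evaluation_worker(
    queue: asyncio.Queue[Episode | None], saved: list[Episode], push_to_db: bool
) -> None:
    """Score finished episodes; persist them only when push_to_db is True."""
    while True:
        episode = await queue.get()
        if episode is None:  # sentinel: the moderator has no more episodes
            break
        episode.scores = evaluate(episode)
        if push_to_db:
            saved.append(episode)  # stand-in for epilog.save()


async def run_demo(push_to_db: bool) -> list[Episode]:
    queue: asyncio.Queue[Episode | None] = asyncio.Queue()
    saved: list[Episode] = []
    worker = asyncio.create_task(evaluation_worker(queue, saved, push_to_db))
    # Instead of calling save() directly, the moderator enqueues the episode.
    await queue.put(
        Episode("ep-1", [("Alice", "Bob", "Hi"), ("Bob", "Alice", "Hello"), ("Alice", "Bob", "Bye")])
    )
    await queue.put(None)
    await worker
    return saved


saved = asyncio.run(run_demo(push_to_db=True))
print(saved[0].scores)  # {'Alice': 2.0, 'Bob': 1.0}
```

Decoupling the moderator from evaluation this way lets the conversation loop finish without waiting on a (possibly slow, LLM-based) evaluator.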

XuhuiZhou · 2025-01-08T18:17:18Z

examples/experimental/sotopia_original_replica/llm_agent_sotopia.py

+        self.model_name: str = model_name
+        self.agent_profile_pk: str = agent_pk
+        self.name: str = agent_name
+        self.background: dict = background


Note that self.background is not used anywhere when generating the agent's actions.
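A minimal sketch of one way to make the background matter: include it in the prompt sent to the model. LLMAgentSketch and build_prompt() are hypothetical; the four attributes mirror the diff above, and the actual LLM call is omitted.

```python
import json
from typing import Any


class LLMAgentSketch:
    """Hypothetical stand-in for the example agent; only prompt building is shown."""

    def __init__(
        self, model_name: str, agent_pk: str, agent_name: str, background: dict[str, Any]
    ) -> None:
        self.model_name: str = model_name
        self.agent_profile_pk: str = agent_pk
        self.name: str = agent_name
        self.background: dict[str, Any] = background

    def build_prompt(self, history: list[tuple[str, str, str]]) -> str:
        """Prepend the agent's background so the model actually conditions on it."""
        background_text = json.dumps(self.background, indent=2, sort_keys=True)
        turns = "\n".join(
            f"{sender} -> {receivers}: {content}" for sender, receivers, content in history
        )
        return (
            f"You are {self.name}.\n"
            f"Your background:\n{background_text}\n\n"
            f"Conversation so far:\n{turns}\n\n"
            "What do you say or do next?"
        )


agent = LLMAgentSketch("example-model", "pk-123", "Alice", {"occupation": "teacher"})
prompt = agent.build_prompt([("Bob", "Alice", "Hi")])
print(prompt)
```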

XuhuiZhou · 2025-01-08T18:39:52Z

@JXZhou0224 I have changed the base branch from demo to main; please run git merge main to update the branch and resolve any conflicts.

Please also run uv run mypy --strict . to make sure there are no mypy errors.

XuhuiZhou and others added 24 commits December 4, 2024 20:58

fix ci error

cadf06d

solving pytests

187a21b

improve the tests

ec5c394

add custom eval fast api (#268)

1a1244e

fix mypy error

ae4014e

remove dev tag

70293aa

add custom eval

2526be1

base dimension

b0b53d8

fix ui mypy

22a1ecf

fix mypy

302835a

add delete dimension

0e44603

update streamlit ui

520a1dd

ignores the ui directory

5ffdee3

Committing changes before push

f9e2ea3

pytest for eval dimension

a45e440

fix mypy

24ca6a3

clean up comments

6b2db2a

XuhuiZhou marked this pull request as draft January 7, 2025 19:46


back compatible with evaluators[draft]

bbf6061


add evaluation node

7558927


XuhuiZhou changed the base branch from demo to main January 8, 2025 18:38
